17 research outputs found

    11th German Conference on Chemoinformatics (GCC 2015) : Fulda, Germany. 8-10 November 2015.

    Analysis of in vitro bioactivity data extracted from drug discovery literature and patents: Ranking 1654 human protein targets by assayed compounds and molecular scaffolds

    Background: Since the classic Hopkins and Groom druggable genome review in 2002, there have been a number of publications updating both the hypothetical and successful human drug target statistics. However, listings of research targets that define the area between these two extremes are sparse because of the challenges of collating published information at the necessary scale. We have addressed this by interrogating databases, populated by expert curation, of bioactivity data extracted from patents and journal papers over the last 30 years. Results: From a subset of just over 27,000 documents we have extracted a set of compound-to-target relationships for biochemical in vitro binding-type assay data for 1,736 human proteins and 1,654 gene identifiers. These are linked to 1,671,951 compound records derived from 823,179 unique chemical structures. The distribution showed a compounds-per-target average of 964 with a maximum of 42,869 (Factor Xa). The list includes non-targets, failed targets and cross-screening targets. The top-278 most actively pursued targets cover 90% of the compounds. We further investigated target ranking by determining the number of molecular frameworks and scaffolds. These were compared to the compound counts as alternative measures of chemical diversity on a per-target basis. Conclusions: The compounds-per-protein listing generated in this work (provided as a supplementary file) represents the major proportion of the human drug target landscape defined by published data. We supplemented the simple ranking by the number of compounds assayed with additional rankings by molecular topology. These showed significant differences and provide complementary assessments of chemical tractability.
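The "top-N targets cover 90% of the compounds" statistic in the abstract above can be illustrated with a minimal sketch. It assumes a simple cumulative sum over per-target compound counts; the function name, the target labels, and the counts are hypothetical and not taken from the paper, and because compounds shared between targets are counted once per target here, this may differ from how the authors computed their figure.

```python
def targets_needed_for_coverage(compound_counts, fraction=0.9):
    """Return how many top-ranked targets reach `fraction` of all compound records.

    Assumption: coverage is measured on summed per-target counts, so a
    compound assayed against two targets is counted twice.
    """
    # Rank targets from most to least assayed compounds.
    ranked = sorted(compound_counts.values(), reverse=True)
    total = sum(ranked)
    running = 0
    for n, count in enumerate(ranked, start=1):
        running += count
        if running >= fraction * total:
            return n
    return len(ranked)


# Made-up counts for illustration only.
counts = {"Target A": 500, "Target B": 300, "Target C": 150, "Target D": 50}
n_targets = targets_needed_for_coverage(counts, fraction=0.9)
```

With these illustrative counts, the three most populated targets are needed before the running total reaches 90% of the 1,000 records.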

    Annotated chemical patent corpus: A gold standard for text mining

    Exploring the chemical and biological space covered by patent applications is crucial in early-stage medicinal chemistry activities. Patent analysis can provide understanding of compound prior art, novelty checking, validation of biological assays, and identification of new starting points for chemical exploration. Extracting chemical and biological entities from patents through manual extraction by expert curators can take a substantial amount of time and resources. Text mining methods can help to ease this process. To validate the performance of such methods, a manually annotated patent corpus is essential. In this study we have produced a large gold standard chemical patent corpus. We developed annotation guidelines and selected 200 full patents from the World Intellectual Property Organization, United States Patent and Trademark Office, and European Patent Office. The patents were pre-annotated automatically and made available to four independent annotator groups each consisting of two to ten annotators. The annotators marked chemicals in different subclasses, diseases, t

    Discovery of Highly Isoform Selective Orally Bioavailable Phosphoinositide 3-Kinase (PI3K)-γ Inhibitors

    In this paper, we describe the discovery and optimization of a new chemotype of isoform selective PI3Kγ inhibitors. Starting from an HTS hit, potency and physicochemical properties could be improved to give compounds such as 15, which is a potent and remarkably selective PI3Kγ inhibitor with ADME properties suitable for oral administration. Compound 15 was advanced into in vivo studies showing dose-dependent inhibition of LPS-induced airway neutrophilia in rats when administered orally

    Nonadditivity in public and inhouse data: implications for drug design

    Numerous ligand-based drug discovery projects are based on structure-activity relationship (SAR) analysis, such as Free-Wilson (FW) or matched molecular pair (MMP) analysis. Intrinsically they assume linearity and additivity of substituent contributions. These techniques are challenged by nonadditivity (NA) in protein–ligand binding where the change of two functional groups in one molecule results in much higher or lower activity than expected from the respective single changes. Identifying nonlinear cases and possible underlying explanations is crucial for a drug design project since it might influence which lead to follow. By systematically analyzing all AstraZeneca (AZ) inhouse compound data and publicly available ChEMBL25 bioactivity data, we show significant NA events in almost every second assay among the inhouse and once in every third assay in public data sets. Furthermore, 9.4% of all compounds of the AZ database and 5.1% from public sources display significant additivity shifts indicating important SAR features or fundamental measurement errors. Using NA data in combination with machine learning showed that nonadditive data is challenging to predict and even the addition of nonadditive data into training did not result in an increase in predictivity. Overall, NA analysis should be applied on a regular basis in many areas of computational chemistry and can further improve rational drug design. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-021-00525-z
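The notion of nonadditivity described in the abstract above can be made concrete with a small sketch of a double-transformation cycle: four compounds that differ at two positions, where the effect of changing both groups is compared with the effects of the single changes. The function name, the sign convention, and the pActivity values (for example pIC50) are assumptions for illustration; the statistical significance testing used in the study is not reproduced here.

```python
def nonadditivity(p_base, p_single_a, p_single_b, p_double):
    """Nonadditivity of a four-compound cycle, in log activity units.

    p_base      activity of the starting compound
    p_single_a  activity after changing only the first group
    p_single_b  activity after changing only the second group
    p_double    activity after changing both groups

    Under perfect additivity the second change has the same effect
    with or without the first one, so the result is zero.
    """
    effect_b_with_a = p_double - p_single_a
    effect_b_alone = p_single_b - p_base
    return effect_b_with_a - effect_b_alone


# Made-up pActivity values for illustration only.
additive_cycle = nonadditivity(6.0, 7.0, 6.5, 7.5)
nonadditive_cycle = nonadditivity(6.0, 7.0, 6.5, 9.0)
```

In the first cycle the double change gains exactly the sum of the two single changes, so the value is zero; in the second, the doubly modified compound is 1.5 log units more active than additivity would predict.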

    Visualisation and exploitation of the chemical space covered by patents

    Exploring binding mechanisms in nuclear hormone receptors by Monte Carlo and x-ray-derived motions

    In this study, we performed an extensive exploration of the ligand entry mechanism for members of the steroid nuclear hormone receptor family (androgen receptor, estrogen receptor α, glucocorticoid receptor, mineralocorticoid receptor, and progesterone receptor) and their endogenous ligands. The exploration revealed a shared entry path through the helix 3, 7, and 11 regions. Examination of the x-ray structures of the receptor-ligand complexes further showed two distinct folds of the helix 6-7 region, classified as "open" and "closed", which could potentially affect ligand binding. To improve sampling of the helix 6-7 loop, we incorporated motion modes based on principal component analysis of existing crystal structures of the receptors and applied them to the protein-ligand sampling. A detailed comparison with the anisotropic network model (an elastic network model) highlights the importance of flexibility in the entrance region. While the binding (interaction) energy of individual simulations can be used to score different ligands, extensive sampling further allows us to predict absolute binding free energies and analyze reaction kinetics using Markov state models and Perron-cluster cluster analysis, respectively. The predicted relative binding free energies for three ligands binding to the progesterone receptor are in very good agreement with experimental results and the Perron-cluster cluster analysis highlighted the importance of a peripheral binding site. Our analysis revealed that the flexibility of the helix 3, 7, and 11 regions represents the most important factor for ligand binding. Furthermore, the hydrophobicity of the ligand influences the transition between the peripheral and the active binding site